attention schema
Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
Saxena, Krati, Ruiz, Federico Jurado, Manzi, Guido, Liu, Dianbo, Lamb, Alex
Attention mechanisms have become integral in AI, significantly enhancing model performance and scalability by drawing inspiration from human cognition. Concurrently, the Attention Schema Theory (AST) in cognitive science posits that individuals manage their attention by creating a model of the attention itself, effectively allocating cognitive resources. Inspired by AST, we introduce ASAC (Attention Schema-based Attention Control), which integrates the attention schema concept into artificial neural networks. Our initial experiments focused on embedding the ASAC module within transformer architectures. This module employs a Vector-Quantized Variational AutoEncoder (VQVAE) as both an attention abstractor and controller, facilitating precise attention management. By explicitly modeling attention allocation, our approach aims to enhance system efficiency. We demonstrate ASAC's effectiveness in both the vision and NLP domains, highlighting its ability to improve classification accuracy and expedite the learning process. Our experiments with vision transformers across various datasets illustrate that the attention controller not only boosts classification accuracy but also accelerates learning. Furthermore, we have demonstrated the model's robustness and generalization capabilities across noisy and out-of-distribution datasets. In addition, we have showcased improved performance in multi-task settings. Quick experiments reveal that the attention schema-based module enhances resilience to adversarial attacks, optimizes attention to improve learning efficiency, and facilitates effective transfer learning and learning from fewer examples. These promising results establish a connection between cognitive science and machine learning, shedding light on the efficient utilization of attention mechanisms in AI systems.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > China (0.04)
- Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.68)
- Information Technology (0.67)
Improving How Agents Cooperate: Attention Schemas in Artificial Neural Networks
Farrell, Kathryn T., Ziman, Kirsten, Graziano, Michael S. A.
Growing evidence suggests that the brain uses an "attention schema" to monitor, predict, and help control attention. It has also been suggested that an attention schema improves social intelligence by allowing one person to better predict another. Given their potential advantages, attention schemas have been increasingly tested in machine learning. Here we test small deep learning networks to determine how the addition of an attention schema may affect performance on a range of tasks. First, we found that an agent with an attention schema is better at judging or categorizing the attention states of other agents. Second, we found that an agent with an attention schema develops a pattern of attention that is easier for other agents to judge and categorize. Third, we found that in a joint task where two agents paint a scene together and must predict each other's behavior for best performance, adding an attention schema improves that performance. Finally, we find that the performance improvements caused by an attention schema are not a non-specific result of an increase in network complexity. Not all performance, on all tasks, is improved. Instead, improvement is specific to "social" tasks involving judging, categorizing, or predicting the attention of other agents. These results suggest that an attention schema may be useful in machine learning for improving cooperativity and social behavior.
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Switzerland (0.04)
Attention Schema in Neural Agents
Liu, Dianbo, Bolotta, Samuele, Zhu, He, Bengio, Yoshua, Dumas, Guillaume
Attention has become a common ingredient in deep learning architectures. It adds a dynamical selection of information on top of the static selection of information supported by weights. In the same way, we can imagine a higher-order informational filter built on top of attention: an Attention Schema (AS), namely, a descriptive and predictive model of attention. In cognitive neuroscience, Attention Schema Theory (AST) supports this idea of distinguishing attention from AS. A strong prediction of this theory is that an agent can use its own AS to also infer the states of other agents' attention and consequently enhance coordination with other agents. As such, multi-agent reinforcement learning would be an ideal setting to experimentally test the validity of AST. We explore different ways in which attention and AS interact with each other. Our preliminary results indicate that agents that implement the AS as a recurrent internal control achieve the best performance. In general, these exploratory experiments suggest that equipping artificial agents with a model of attention can enhance their social intelligence.
- North America > Canada > Quebec > Montreal (0.15)
- North America > United States (0.14)
- Health & Medicine > Therapeutic Area > Neurology (0.68)
- Education (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)